Random Forest for the Contextual Bandit Problem
Abstract
To address the contextual bandit problem, we propose an online random forest algorithm. The analysis of the proposed algorithm is based on the sample complexity needed to find the optimal decision stump. The decision stumps are then recursively stacked in a random collection of decision trees, BANDIT FOREST. We show that the proposed algorithm is optimal up to logarithmic factors. The sample complexity depends logarithmically on the number of contextual variables, and the computational cost is linear in the time horizon. These analytical results make the proposed algorithm efficient in real applications, where the number of events to process is huge and where we expect some contextual variables, chosen from a large set, to have potentially non-linear dependencies with the rewards. In experiments illustrating the theoretical analysis, BANDIT FOREST obtains promising results in comparison with state-of-the-art algorithms.
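The notion of an optimal decision stump can be illustrated with a minimal Python sketch. It is not the BANDIT FOREST algorithm itself. It assumes binary contextual variables, actions drawn uniformly at random during data collection, and a plain empirical-mean estimate in place of the paper's sample-complexity analysis. The function name `best_stump` and the data layout are hypothetical.

```python
from itertools import product


def best_stump(samples, n_vars, n_actions):
    """Pick the binary context variable whose one-level split gives the
    highest estimated expected reward (illustrative sketch only).

    samples: list of (context, action, reward), where context is a tuple
    of 0/1 values and actions are assumed to be chosen uniformly at random.
    Returns (variable index, estimated value, best action per variable value).
    """
    if not samples:
        raise ValueError("at least one sample is required")

    # sums[i][v][k] and counts[i][v][k]: reward statistics for action k
    # among the samples where context variable i takes the value v.
    sums = [[[0.0] * n_actions for _ in range(2)] for _ in range(n_vars)]
    counts = [[[0] * n_actions for _ in range(2)] for _ in range(n_vars)]
    for context, action, reward in samples:
        for i in range(n_vars):
            sums[i][context[i]][action] += reward
            counts[i][context[i]][action] += 1

    n = len(samples)
    best = None
    for i in range(n_vars):
        value, policy = 0.0, []
        for v in range(2):
            means = [
                sums[i][v][k] / counts[i][v][k] if counts[i][v][k] else 0.0
                for k in range(n_actions)
            ]
            k_best = max(range(n_actions), key=means.__getitem__)
            policy.append(k_best)
            # Weight the best mean reward by the empirical frequency of x_i = v.
            value += (sum(counts[i][v]) / n) * means[k_best]
        if best is None or value > best[1]:
            best = (i, value, tuple(policy))
    return best


# Toy data (an assumption for illustration): the reward is 1 only when the
# action equals the value of context variable 1, so variable 1 is the best split.
samples = [
    (ctx, a, 1.0 if a == ctx[1] else 0.0)
    for ctx in product((0, 1), repeat=3)
    for a in (0, 1)
]
result = best_stump(samples, n_vars=3, n_actions=2)
```

On this toy data the stump on variable 1 recovers the full reward, while a stump on either other variable cannot do better than 0.5. A tree would be obtained by applying the same selection recursively within each branch.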
Similar resources
Bandit Forest
To address the contextual bandit problem, we propose online decision tree algorithms. The analysis of proposed algorithms is based on the sample complexity needed to find the optimal decision stump. Then, the decision stumps are assembled in a decision tree, Bandit Tree, and in a random collection of decision trees, Bandit Forest. We show that the proposed algorithms are optimal up to a logarit...
Nonparametric Contextual Bandit Optimization via Random Approximation
We examine the stochastic contextual bandit problem in a novel continuous-action setting where the policy lies in a reproducing kernel Hilbert space (RKHS). This provides a framework to handle continuous policy and action spaces in a tractable manner while retaining polynomial regret bounds, in contrast with much prior work in the continuous setting. We extend an optimization perspective that h...
Multi-objective Contextual Multi-armed Bandit Problem with a Dominant Objective
In this paper, we propose a new multi-objective contextual multi-armed bandit (MAB) problem with two objectives, where one of the objectives dominates the other objective. Unlike single-objective MAB problems in which the learner obtains a random scalar reward for each arm it selects, in the proposed problem, the learner obtains a random reward vector, where each component of the reward vector ...
Machine Learning Approaches for Interactive Verification
Interactive verification is a new problem, which is closely related to active learning, but aims to query as many positive instances as possible within some limited query budget. We point out the similarity between interactive verification and another machine learning problem called contextual bandit. The similarity allows us to design interactive verification approaches from existing contextua...
Linear Bayes policy for learning in contextual-bandits
Machine and Statistical Learning techniques are used in almost all online advertisement systems. The problem of discovering which content is more demanded (e.g. receive more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (i.e. bandits with covariates, side information or associative reinforcement learning) associate, to each specific content, several features that de...
Publication date: 2016